-
Notifications
You must be signed in to change notification settings - Fork 16
feat: add S3 write support #71
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
3c12ccf to
00c533b
Compare
paleolimbot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
I don't think it is necessary for this PR, but the ObjectStore may be a good future choice (here it would let you unify the local and s3 code path and make it easier to add future gcs or azure support).
It may be worth setting up a test for this in CI using MinIO or another S3 alternative as otherwise I don't think there are any tests here to validate the behaviour.
|
@prantogg Does this work with the |
james-willis
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. I think Dewey's comment about objectstore makes a lot of sense though.
Thanks @paleolimbot! Agreed on On the MinIO CI test - my thinking was that since |
|
Adding a small fix for S3 error propagation for things like token expiry. |
There's nothing in this PR to indicate that |
d28fb32 to
c0550a7
Compare
c0550a7 to
534a46c
Compare
I see your point @paleolimbot! Thanks for that. Just added MinIO tests |
This PR addresses #70 - enables direct S3 streaming for generated data, eliminating local storage requirements for large-scale dataset generation. S3 URIs in
--output-dirare automatically detected and routed to a streaming multipart upload implementation.--output-dir s3://bucket/prefix/for all output formats (Parquet, CSV, TBL) and zone tablesAWS_ACCESS_KEY_ID,AWS_SECRET_ACCESS_KEY,AWS_REGION,AWS_SESSION_TOKEN, etc.) viaAmazonS3Builder::from_env()